Bayes-Adaptive POMDPs
Authors
Abstract
Bayesian Reinforcement Learning in MDPs. An MDP is a tuple (S, A, T, R), where S is the set of states, A is the set of actions, T(s, a, s') = Pr(s' | s, a) gives the transition probabilities, and R(s, a) ∈ ℝ is the immediate reward. Assume the transition function T is the only unknown:
• Define a prior Pr(T).
• Maintain the posterior Pr(T | s1, a1, s2, a2, ..., a_{t-1}, s_t) via Bayes' rule.
• Act so as to maximize the expected return given the current posterior and how it will evolve.
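The posterior update above can be sketched with a conjugate Dirichlet prior over each row of the transition function, which is the standard choice for discrete MDPs. This is a minimal illustration, not the paper's full BA-POMDP construction; the class and parameter names are hypothetical.

```python
import numpy as np

class DirichletTransitionModel:
    """Maintains Pr(T | history) for a discrete MDP with unknown transitions."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # counts[s, a, s'] are Dirichlet parameters; the initial
        # pseudo-counts encode the prior Pr(T).
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        # Bayes' rule for the Dirichlet-multinomial pair reduces to
        # incrementing the count of the observed transition.
        self.counts[s, a, s_next] += 1.0

    def posterior_mean(self, s, a):
        # Expected transition probabilities Pr(s' | s, a) under the posterior.
        return self.counts[s, a] / self.counts[s, a].sum()

model = DirichletTransitionModel(n_states=2, n_actions=1)
model.update(0, 0, 1)          # observe transition 0 --a=0--> 1
print(model.posterior_mean(0, 0))
```

With a uniform prior of 1 pseudo-count per successor state, one observed transition to state 1 shifts the posterior mean to (1/3, 2/3). A Bayes-optimal agent would then plan not just with this mean model but with the full belief over T and its evolution, which is what makes the problem a POMDP over model parameters.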
Similar Resources
Learning in POMDPs with Monte Carlo Tree Search
The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation an...
Bayes-Adaptive Interactive POMDPs
We introduce the Bayes-Adaptive Interactive Partially Observable Markov Decision Process (BA-IPOMDP), the first multiagent decision model that explicitly incorporates model learning. As in I-POMDPs, the BA-IPOMDP agent maintains beliefs over interactive states, which include the physical states as well as the other agents’ models. The BA-IPOMDP assumes that the state transition and observation ...
Exploration in POMDPs
In recent work, Bayesian methods for exploration in Markov decision processes (MDPs) and for solving known partially-observable Markov decision processes (POMDPs) have been proposed. In this paper we review the similarities and differences between those two domains and propose methods to deal with them simultaneously. This enables us to attack the Bayes-optimal reinforcement learning problem in...
Message-passing algorithms for large structured decentralized POMDPs
Decentralized POMDPs provide a rigorous framework for multi-agent decision-theoretic planning. However, their high complexity has limited scalability. In this work, we present a promising new class of algorithms based on probabilistic inference for infinite-horizon ND-POMDPs—a restricted Dec-POMDP model. We first transform the policy optimization problem to that of likelihood maximization in a ...
COMP-627 Project: Belief State Space Compression for Bayes-Adaptive POMDPs
Partially Observable Markov Decision Processes (POMDP) provide a nice mathematical framework for sequential decision making in partially observable stochastic environments. While it is generally assumed that the POMDP model is known, this is rarely the case in practice, as the parameters of the model must be finely tuned to reflect the reality as close as possible. Hence it is of crucial import...
Publication year: 2007